169 research outputs found
FunTree: a resource for exploring the functional evolution of structurally defined enzyme superfamilies
FunTree is a new resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. Gathering this range of data into a single resource allows the investigation of how novel enzyme functions have evolved within a structurally defined superfamily, and provides a means to analyse trends across many superfamilies. This is done not only within the context of an enzyme's sequence and structure but also the relationships between the reactions the enzymes catalyse. Developed in tandem with the CATH database, it currently comprises 276 superfamilies covering ∼1800 (70%) of sequence-assigned enzyme reactions. Central to the resource are phylogenetic trees generated from structurally informed multiple sequence alignments, using both domain structural alignments supplemented with domain sequences and whole-sequence alignments based on commonality of multi-domain architectures. These trees are decorated with functional annotations such as metabolite similarity, as well as annotations from manually curated resources such as the Catalytic Site Atlas and MACiE for enzyme mechanisms. The resource is freely available through a web interface: www.ebi.ac.uk/thorton-srv/databases/FunTree
Composite structural motifs of binding sites for delineating biological functions of proteins
Most biological processes are described as a series of interactions between
proteins and other molecules, and interactions are in turn described in terms
of atomic structures. To annotate protein functions as sets of interaction
states at atomic resolution, and thereby to better understand the relation
between protein interactions and biological functions, we conducted exhaustive
all-against-all atomic structure comparisons of all known binding sites for
ligands including small molecules, proteins and nucleic acids, and identified
recurring elementary motifs. By integrating the elementary motifs associated
with each subunit, we defined composite motifs which represent
context-dependent combinations of elementary motifs. It is demonstrated that
function similarity can be better inferred from composite motif similarity
compared to the similarity of protein sequences or of individual binding sites.
By integrating the composite motifs associated with each protein function, we
define meta-composite motifs each of which is regarded as a time-independent
diagrammatic representation of a biological process. It is shown that
meta-composite motifs provide richer annotations of biological processes than
sequence clusters. The present results serve as a basis for bridging atomic
structures to higher-order biological phenomena by classification and
integration of binding site structures.
Comment: 34 pages, 7 figures
The role of viral genomics in understanding COVID-19 outbreaks in long-term care facilities
We reviewed all genomic epidemiology studies on COVID-19 in long-term care facilities (LTCFs) that had been published to date. We found that staff and residents were usually infected with identical, or near identical, SARS-CoV-2 genomes. Outbreaks usually involved one predominant cluster, and the same lineages persisted in LTCFs despite infection control measures. Outbreaks were most commonly due to a single introduction or a few introductions followed by spread within the facility, rather than a series of seeding events from the community into LTCFs. The sequencing of samples taken consecutively from the same individuals at the same facilities showed the persistence of the same genome sequence, indicating that the sequencing technique was robust over time. When combined with local epidemiology, genomics allowed probable transmission sources to be better characterised. Transmission between LTCFs was detected in multiple studies. The mortality rate among residents was high in all facilities, regardless of the lineage. Bioinformatics methods were inadequate in a third of the studies reviewed, and reproducing the analyses was difficult because sequencing data were not available in many facilities
FLORA: a novel method to predict protein function from structure in diverse superfamilies
Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues
Spatial growth rate of emerging SARS-CoV-2 lineages in England, September 2020–December 2021
This paper uses a robust method of spatial epidemiological analysis to assess the spatial growth rate of multiple lineages of SARS-CoV-2 in the local authority areas of England, September 2020-December 2021. Using the genomic surveillance records of the COVID-19 Genomics UK (COG-UK) Consortium, the analysis identifies a substantial (7.6-fold) difference in the average rate of spatial growth of 37 sample lineages, from the slowest (Delta AY.4.3) to the fastest (Omicron BA.1). Spatial growth of the Omicron (B.1.1.529 and BA) variant was found to be 2.81× faster than the Delta (B.1.617.2 and AY) variant and 3.76× faster than the Alpha (B.1.1.7 and Q) variant. In addition to AY.4.2 (a designated variant under investigation, VUI-21OCT-01), three Delta sublineages (AY.43, AY.98 and AY.120) were found to display a statistically faster rate of spatial growth than the parent lineage and would seem to merit further investigation. We suggest that the monitoring of spatial growth rates is a potentially valuable adjunct to outbreak response procedures for emerging SARS-CoV-2 variants in a defined population
DODO: an efficient orthologous genes assignment tool based on domain architectures. Domain based ortholog detection
Background: Orthologs are genes derived from the same ancestral gene locus after speciation events. Orthologous proteins usually have similar sequences and perform comparable biological functions. Ortholog identification is therefore useful in annotating newly sequenced genomes. With the rapidly increasing number of sequenced genomes, constructing or updating ortholog relationships between all genomes requires considerable effort and computation time. In addition, elucidating ortholog relationships between distantly related genomes is challenging because of the lower sequence similarity. An efficient ortholog detection method that can deal with a large number of distantly related genomes is therefore desired. Results: An efficient ortholog detection pipeline, DODO (DOmain based Detection of Orthologs), is created on the basis of domain architectures in this study. Supported by domain composition, which is usually directly related to protein function, DODO can facilitate ortholog detection across distantly related genomes. DODO works in two main steps. Starting from domain information, it first assigns proteins to groups according to their domain architectures, and then identifies orthologs within those groups with much reduced complexity. Here DODO is shown to detect orthologs between two genomes in a considerably shorter time than the traditional reciprocal best hits method, and the difference is more significant when a large number of genomes is analyzed. The output of DODO is highly comparable with other known ortholog databases. Conclusions: DODO provides a new efficient pipeline for detection of orthologs in a large number of genomes. In addition, a database established with DODO is easier to maintain and can be updated with relatively little effort. The DODO pipeline can be downloaded from http://140.109.42.19:16080/dodo_web/home.htm
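The two steps described in the DODO abstract (group proteins by domain architecture, then look for orthologs only inside each group) can be illustrated with a minimal sketch. This is not the DODO pipeline itself: the abstract does not say how orthologs are picked within a group, so the sketch assumes reciprocal best hits restricted to each group, and `toy_score`, `PROTEINS`, the domain labels `D1`/`D2` and all sequences are hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical input: protein id -> (genome, domain architecture, sequence).
PROTEINS = {
    "A1": ("genomeA", ("D1",), "MKVLAT"),
    "A2": ("genomeA", ("D1", "D2"), "MSTQQL"),
    "B1": ("genomeB", ("D1",), "MKVLGT"),
    "B2": ("genomeB", ("D1", "D2"), "MSTQAL"),
    "B3": ("genomeB", ("D1",), "MAAAGT"),
}


def toy_score(seq_a, seq_b):
    """Toy similarity: count of identical residues at the same position.
    A real pipeline would use an alignment score instead (assumption)."""
    return sum(x == y for x, y in zip(seq_a, seq_b))


def group_by_architecture(proteins):
    """Step 1: put proteins with the same ordered domain tuple together."""
    groups = defaultdict(list)
    for pid, (_genome, domains, _seq) in proteins.items():
        groups[domains].append(pid)
    return groups


def find_orthologs(proteins, genome_a, genome_b):
    """Step 2 (assumed): reciprocal best hits, but only within each group."""
    pairs = []
    for members in group_by_architecture(proteins).values():
        a_ids = [p for p in members if proteins[p][0] == genome_a]
        b_ids = [p for p in members if proteins[p][0] == genome_b]
        if not a_ids or not b_ids:
            continue

        def score(a, b):
            return toy_score(proteins[a][2], proteins[b][2])

        best_for_a = {a: max(b_ids, key=lambda b: score(a, b)) for a in a_ids}
        best_for_b = {b: max(a_ids, key=lambda a: score(a, b)) for b in b_ids}
        # Keep a pair only if each protein is the other's best hit.
        pairs.extend((a, b) for a, b in best_for_a.items() if best_for_b[b] == a)
    return sorted(pairs)


if __name__ == "__main__":
    print(find_orthologs(PROTEINS, "genomeA", "genomeB"))
```

Because comparisons are made only between proteins that share an architecture, the number of pairwise scores grows with the group sizes rather than with the full genome sizes, which is the kind of complexity reduction the abstract refers to.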
Emergence and maintenance of actionable genetic drivers at medulloblastoma relapse
BACKGROUND: 90% of tumors) and established genetic drivers (e.g. SHH/WNT/P53 mutations; 60% of rMB events) were maintained from diagnosis. Critically, acquired and maintained rMB events converged on targetable pathways which were significantly enriched at relapse (e.g. DNA damage-signaling) and specific events (e.g. 3p loss) predicted survival post-relapse. CONCLUSIONS: rMB is defined by the emergence of novel events and pathways, in concert with selective maintenance of established genetic drivers. Together, these define the actionable genetic landscape of rMB and provide a basis for improved clinical management and development of stratified therapeutics, across disease-course
CLIMB-COVID: continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance.
Funder: Wellcome Trust
In response to the ongoing SARS-CoV-2 pandemic in the UK, the COVID-19 Genomics UK (COG-UK) consortium was formed to rapidly sequence SARS-CoV-2 genomes as part of a national-scale genomic surveillance strategy. The network consists of universities, academic institutes, regional sequencing centres and the four UK Public Health Agencies. We describe the development and deployment of CLIMB-COVID, an encompassing digital infrastructure to address the challenge of collecting and integrating both genomic sequencing data and sample-associated metadata produced across the COG-UK network
Combinatorial Clustering of Residue Position Subsets Predicts Inhibitor Affinity across the Human Kinome
The protein kinases are a large family of enzymes that play fundamental roles in propagating signals within the cell. Because
of the high degree of binding site similarity shared among protein kinases, designing drug compounds with high specificity
among the kinases has proven difficult. However, computational approaches to comparing the 3-dimensional geometry and
physicochemical properties of key binding site residue positions have been shown to be informative of inhibitor selectivity.
The Combinatorial Clustering Of Residue Position Subsets (CCORPS) method, introduced here, provides a semi-supervised
learning approach for identifying structural features that are correlated with a given set of annotation labels. Here, CCORPS is
applied to the problem of identifying structural features of the kinase ATP binding site that are informative of inhibitor
binding. CCORPS is demonstrated to make perfect or near-perfect predictions for the binding affinity profile of 8 of the 38
kinase inhibitors studied, while only having overall poor predictive ability for 1 of the 38 compounds. Additionally, CCORPS is
shown to identify shared structural features across phylogenetically diverse groups of kinases that are correlated with
binding affinity for particular inhibitors; such instances of structural similarity among phylogenetically diverse kinases are
also shown not to be rare. Finally, these function-specific structural features may serve as potential starting
points for the development of highly specific kinase inhibitors
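The general idea named in the CCORPS abstract (look at many subsets of binding-site residue positions and ask which ones separate the labelled proteins) can be illustrated with a toy sketch. It is not the CCORPS method: the abstract describes clustering on 3-dimensional geometry and physicochemical properties, whereas this sketch simply groups proteins that carry identical residue letters at the chosen positions. The function `label_pure_subsets`, the data in `RESIDUES` and `LABELS`, and the parameters `subset_size` and `min_labelled` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical aligned binding-site residues, one letter per position.
RESIDUES = {"K1": "GAKE", "K2": "GAKD", "K3": "GTRE", "K4": "GAKE"}
# Labels are known for only some proteins (the semi-supervised setting).
LABELS = {"K1": "binds", "K2": "binds", "K3": "no"}


def label_pure_subsets(residues, labels, subset_size=2, min_labelled=2):
    """Enumerate position subsets and report groups whose labelled members
    all share one label. Grouping by identical residues is a stand-in for
    real structural clustering (assumption)."""
    n_positions = len(next(iter(residues.values())))
    found = []
    for positions in combinations(range(n_positions), subset_size):
        groups = defaultdict(list)
        for protein, seq in residues.items():
            groups[tuple(seq[i] for i in positions)].append(protein)
        for key, members in groups.items():
            labelled = [m for m in members if m in labels]
            seen = {labels[m] for m in labelled}
            if len(labelled) >= min_labelled and len(seen) == 1:
                found.append((positions, key, seen.pop(), sorted(members)))
    return found


if __name__ == "__main__":
    for positions, key, label, members in label_pure_subsets(RESIDUES, LABELS):
        print(positions, key, label, members)
```

In this toy data the unlabelled protein `K4` falls into groups whose labelled members all carry the label `binds`, which is how such a group could suggest a label for an uncharacterised protein.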